Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tolerate failure #183

Open
wants to merge 7 commits into
base: 4.x
Choose a base branch
from
Open

Conversation

yaauie
Copy link
Contributor

@yaauie yaauie commented Oct 20, 2022

When an Event cannot be created directly from the hit, or when the
docinfo cannot be merged into a non-hash field in the hit, emit an
Event tagged with _elasticsearch_input_failure that contains the
JSON-encoded hit in [event][original] instead of crashing.

Resolves #182


Common causes are:

- When the hit result contains top-level fields that are {logstash-ref}/processing.html#reserved-fields[reserved in Logstash] but do not have the expected shape. Use the <<plugins-{type}s-{plugin}-target>> directive to avoid conflicts with the top-level namespace.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karenzone can I get 👀 on this one to make sure I'm not risking breaking the docs build?

It looks like we added {logstash-ref}/processing.html#reserved-fields in 7.7 (it doesn't exist on 7.6 and before), but is present on all of the versions of the reference available in the quick picker (master, current, 8.5, 8.4, 7.17).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we added {logstash-ref}/processing.html#reserved-fields in 7.7 (it doesn't exist on 7.6 and before), but is present on all of the versions of the reference available in the quick picker (master, current, 8.5, 8.4, 7.17).

The gemlock release file controls which plugin versions get mapped/picked up for each stack version when we run docgen. For that reason, the plugin<->stack versioning generally works itself out without any manual intervention from us.

Copy link
Contributor

@karenzone karenzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor wording tweaks inline for your consideration and addressed versioning question. Otherwise, LGTM

TL;DR. The versioning should work itself out and your linking in spot on.

docs/index.asciidoc Show resolved Hide resolved
docs/index.asciidoc Show resolved Hide resolved

Common causes are:

- When the hit result contains top-level fields that are {logstash-ref}/processing.html#reserved-fields[reserved in Logstash] but do not have the expected shape. Use the <<plugins-{type}s-{plugin}-target>> directive to avoid conflicts with the top-level namespace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we added {logstash-ref}/processing.html#reserved-fields in 7.7 (it doesn't exist on 7.6 and before), but is present on all of the versions of the reference available in the quick picker (master, current, 8.5, 8.4, 7.17).

The gemlock release file controls which plugin versions get mapped/picked up for each stack version when we run docgen. For that reason, the plugin<->stack versioning generally works itself out without any manual intervention from us.

Copy link
Contributor

@karenzone karenzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signing off on the docs portion in case you're waiting for an approval to move forward.

When SECURE_INTEGRATION is speicified, the (non-secure) `:integration` specs
are excluded, so we cannot have the `:secure_integration` specs wrapped in a
context flagged as `:integration`.
In order for the `ca_trusted_fingerprint` specs to work with the CA's
fingerprint, ES needs to be configured to present a cert chain that
includes the CA.
@yaauie yaauie changed the base branch from main to 4.x December 26, 2024 22:50
@yaauie yaauie closed this Dec 26, 2024
@yaauie yaauie reopened this Dec 26, 2024
When an Event cannot be created directly from the hit, or when the
docinfo cannot be merged into a non-hash field in the hit, emit an
Event tagged with `_elasticsearch_input_failure` that contains the
JSON-encoded hit in `[event][original]` instead of crashing.
# @param scroll_id [String]: a scroll id to resume
# @return [Array(Boolean,String)]: a tuple representing whether the response
#
def process_next_scroll(output_queue, scroll_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cant quite figure out how this is used.

rescue => e
serialized_hit = hit.to_json
logger.warn("Event creation error, original data now in [event][original] field", message: e.message, exception: e.class, data: serialized_hit)
return event_factory.new_event('event' => { 'original' => serialized_hit }, 'tags' => ['_elasticsearch_input_failure'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return event_factory.new_event('event' => { 'original' => serialized_hit }, 'tags' => ['_elasticsearch_input_failure'])
event_factory.new_event('event' => { 'original' => serialized_hit }, 'tags' => ['_elasticsearch_input_failure'])

Nit: Shouldnt need the explicit return, but I see that is not the only instance in this project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is technically redundant, but present for readability.

The explicit return statement makes clear to me that we are intentionally returning that object from the rescue... block, because otherwise when I'm reading code I don't know whether the return value matters to the caller(s).

It's a side-effect of using exceptions for flow-control (which is less than ideal, but it is what we are already starting with).

set_docinfo_fields(hit, event) if @docinfo

event
rescue => e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rescue => e
rescue RuntimeError => e

Looks like set_docinfo_fields explicitly raises a RuntimeError now? May be best to be as specific as possible with error catching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this isn't the only exception we're handling here.

We know that EventFactory#new_event (which delegates to Event::new) can give us exceptions too, specifically when the payload uses reserved fields in incorrect ways. We want to catch any situation in which we fail to create an event from the hit.

cat es.crt ca.crt > es.chain.crt

# output ISO8601 timestamp to file
date -Iseconds > GENERATED_AT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like expiration date would be more useful 🤷 or both.

openssl x509 -enddate -noout -in es.crt | sed 's/notAfter=//'


event
rescue => e
serialized_hit = hit.to_json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming if these were not json serializable in the first place we would never get this far?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehhh, we deserialized the hit from json, so it should be able to go back?

If we can't, we are at least not creating a new failure scenario (merely failing to handle something that currently already crashes the plugin).

Copy link
Contributor

@donoghuc donoghuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cleanup looks good (one small suggestion on cert generation script).

Release prep looks correct (version number is good and changelog looks accurate)

Had a couple nit comments and a question about potential unused code. Other than that this looks like a good improvement.

rescue => e
serialized_hit = hit.to_json
logger.warn("Event creation error, original data now in [event][original] field", message: e.message, exception: e.class, data: serialized_hit)
return event_factory.new_event('event' => { 'original' => serialized_hit }, 'tags' => ['_elasticsearch_input_failure'])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is technically redundant, but present for readability.

The explicit return statement makes clear to me that we are intentionally returning that object from the rescue... block, because otherwise when I'm reading code I don't know whether the return value matters to the caller(s).

It's a side-effect of using exceptions for flow-control (which is less than ideal, but it is what we are already starting with).

lib/logstash/inputs/elasticsearch.rb Outdated Show resolved Hide resolved
set_docinfo_fields(hit, event) if @docinfo

event
rescue => e
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this isn't the only exception we're handling here.

We know that EventFactory#new_event (which delegates to Event::new) can give us exceptions too, specifically when the payload uses reserved fields in incorrect ways. We want to catch any situation in which we fail to create an event from the hit.


event
rescue => e
serialized_hit = hit.to_json
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehhh, we deserialized the hit from json, so it should be able to go back?

If we can't, we are at least not creating a new failure scenario (merely failing to handle something that currently already crashes the plugin).

@yaauie yaauie requested a review from donoghuc December 31, 2024 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failure to create an event from the payload can crash the plugin
4 participants